Abstract. Recent research has shown that rich interactions among entities in
nature and society give rise to complex networks with community structures.
Although the investigation of community structures has led to many successful
algorithms, most of them find only separated communities, while in the vast
majority of real-world networks communities overlap to some extent. Moreover,
the vertices of a network can often belong to different domains as well.
Therefore, in this paper, we propose a novel algorithm, BiTector
(Bi-community Detector), to efficiently mine overlapping communities
in large-scale sparse bipartite networks. It depends only on the network
topology and does not require any prior knowledge about the number of
communities or the original partition of the network. We apply the algorithm
to real-world data from different domains, showing that BiTector can
successfully identify the overlapping community structures of bipartite
networks.

1 Introduction

In recent years, it has been found that both physical systems in nature
and engineered artifacts in human society can be modeled as complex
networks [1][2], such as the Internet, the World Wide Web, social networks, and
citation networks. Although these systems come from very different domains,
they all have community structure [3][4] in common: their vertices fall into
groups such that edges are dense within groups and sparse between groups.

The existence of community structures has important practical significance.
For example, the communities in the World Wide Web correspond to topics
of interest. In social networks, individuals belonging to the same community
are likely to have properties in common. Nowadays, community information is
being considered for improving search engines so that they provide better
personalized results. Moreover, the information diffusion and spreading
mechanism in a network can be affected and determined by the community
structures. Hence, identifying the communities is a fundamental step not only
for discovering what makes entities come together, but also for understanding
the overall structural and functional properties of the whole network. As a
result, a wide range of successful algorithms [5] have been proposed to
discover community structures. These methods assume that communities are
separated, placing each vertex in only one community. They do not take into
account the possible overlaps [6] among communities in real-world scenarios;
for example, each of us may participate in many social circles according to
our various hobbies.

Moreover, the vertices of a network may not all belong to the same domain,
which brings about a bipartite network structure. For example, in the
scientific collaboration network [7], two different types of nodes represent
authors and papers respectively; in the movie-actor network [8], each actor is
connected to the films in which he or she has starred; in the collaborative
recommendation network [9], the edges link each customer to the rated
or tagged pages, pictures, videos and other products. In addition, many
biological networks are naturally bipartite, such as the protein interaction
network of yeast [10], where the two types of nodes are bait proteins and prey
proteins, and the human disease network of diseases and their pathogenic
genes [11].

Traditionally, studies of bipartite networks depend on the one-mode
projection of the original network onto two unipartite networks. More
specifically, given a bipartite network $G_{(U,I)}$, where $U$ and $I$ are the
sets of the two different types of nodes, the one-mode projection converts
$G_{(U,I)}$ into $G_U$ and $G_I$ respectively. The adjacency matrices of $G_U$
and $G_I$ are built such that

$$E_{ij}(G_U) = \begin{cases} 1 & \text{if vertices } i \text{ and } j \text{ have a common neighbor } i_k \in I \text{ in } G_{(U,I)} \\ 0 & \text{otherwise} \end{cases}$$

and

$$E_{ij}(G_I) = \begin{cases} 1 & \text{if vertices } i \text{ and } j \text{ have a common neighbor } u_k \in U \text{ in } G_{(U,I)} \\ 0 & \text{otherwise} \end{cases}$$



Thus, many existing community detection algorithms can be applied to $G_U$
and $G_I$ accordingly. Although this projection approach is simple and
intuitive, it may suffer from loss of information. In general, a real-world
bipartite network $G_{(U,I)}$ is a large sparse graph. However, the generated
graphs $G_U$ and $G_I$ may become very dense as a result of the projection. In
Fig. 1a, U0, U1, U2, and U3 have a common neighbor I0, so they form a 4-clique
(complete graph) in Fig. 1b by projection. Similarly, we obtain the same
4-clique from Fig. 1c. However, it is easy to see that in Fig. 1c there is a
closer relationship among U1, U2, and U3 because they all have connections
with both I0 and I1. Yet, in Fig. 1b, the four nodes are indistinguishable from
each other. This problem is very common in real-life networks. For example,
in a collaborative recommendation network, a very popular film can be rated by




Fig. 1. One-mode projection

hundreds of users, just like the scenario shown in Fig. 1a. If we project the
original graph onto the network consisting of all the users, it will contain a
huge clique formed by these hundreds of users. As a result, because of the many
superfluous edges generated by the one-mode projection, the truly meaningful
information may be overwhelmed by the high link density.
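
The information loss discussed above can be reproduced in a few lines. The following Python sketch is only illustrative: the edge lists are transcribed from the textual description of Fig. 1a and Fig. 1c, and the helper name `project_onto_users` is hypothetical.

```python
from itertools import combinations


def project_onto_users(edges):
    """One-mode projection of a bipartite edge list [(user, item), ...] onto users.

    Two users are linked in the projection if they share at least one item.
    """
    users_of_item = {}
    for user, item in edges:
        users_of_item.setdefault(item, set()).add(user)
    projected = set()
    for users in users_of_item.values():
        for a, b in combinations(sorted(users), 2):
            projected.add((a, b))
    return projected


# Fig. 1a as described in the text: U0..U3 all attach to I0.
fig_1a = [("U0", "I0"), ("U1", "I0"), ("U2", "I0"), ("U3", "I0")]
# Fig. 1c as described in the text: additionally U1, U2 and U3 attach to I1.
fig_1c = fig_1a + [("U1", "I1"), ("U2", "I1"), ("U3", "I1")]

proj_a = project_onto_users(fig_1a)
proj_c = project_onto_users(fig_1c)
print(len(proj_a), proj_a == proj_c)
```

Both inputs yield the same six projected edges, so the extra ties among U1, U2, and U3 in Fig. 1c are no longer visible after projection.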

Consequently, the main contribution of this paper is mining overlapping
communities directly on bipartite networks. We would like to answer
questions such as which groups of people are interested in which types
of products, or which circles of scientists prefer to collaborate in which
research areas. The rest of the paper is organized as follows: Section 2
reviews related work. Section 3 describes the overlapping community detection
algorithm BiTector in detail. Experimental results and analysis are presented
in Section 4, and we conclude the paper in Section 5.

2 Related Work 

One of the classic approaches for detecting community structures in unipartite
networks is the GN algorithm [12], which introduces a network modularity metric
and optimizes it globally to find non-overlapping communities. Guimerà
et al. [13] generalize this modularity metric to bipartite networks. They first
distinguish the two parts of the network as actors and teams, and then
formulate the bipartite modularity from the groups of actors that are closely
interconnected through joint participation in many teams. Given vertices $v_i$
and $v_j$, the bipartite modularity is defined as the cumulative deviation of
the number of actual teams in which $v_i$ and $v_j$ have both been involved
from the random expectation. Similarly, Barber [14] defines the bipartite
modularity matrix $B$ as an extension of Newman's recent work [15]. Some key
properties of the eigenspectrum of $B$ are identified and used to specialize
Newman's matrix-based algorithms to bipartite networks.



In parallel, Lehmann et al. [16] extend the $k$-clique community definition
from Palla's work [6]. They define a $K_{a,b}$ biclique community as a union of
all $K_{a,b}$ bicliques that can be reached from each other through a series of
adjacent $K_{a,b}$ bicliques, where $a$ and $b$ are the numbers of vertices
belonging to the two different vertex sets respectively. Just as in Palla's
work, two $K_{a,b}$ bicliques are said to be adjacent if their overlap is at
least a $K_{a-1,b-1}$ biclique.
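
The adjacency rule quoted above can be checked directly on vertex sets. The sketch below is an illustration, not Lehmann et al.'s implementation; the function name `bicliques_adjacent` and the example bicliques are hypothetical.

```python
def bicliques_adjacent(first, second, a, b):
    """Adjacency test for two K_{a,b} bicliques given as (U_part, I_part) set pairs.

    Because both inputs are complete bicliques, the vertices they share also
    induce a biclique, so it is enough to count shared vertices on each side.
    """
    shared_u = first[0] & second[0]
    shared_i = first[1] & second[1]
    return len(shared_u) >= a - 1 and len(shared_i) >= b - 1


# Hypothetical K_{2,2} bicliques used only for illustration.
k1 = ({"u0", "u1"}, {"i0", "i1"})
k2 = ({"u1", "u2"}, {"i1", "i2"})   # shares u1 and i1 with k1
k3 = ({"u3", "u4"}, {"i1", "i2"})   # shares no U vertex with k1

print(bicliques_adjacent(k1, k2, a=2, b=2))
print(bicliques_adjacent(k1, k3, a=2, b=2))
```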

To sum up, the modularity-based algorithms, like GN with O(m^) time
complexity (m is the number of edges), are designed to find non-overlapping
communities and often suffer from efficiency problems that make them
unsuitable for the large-scale networks of practical scenarios. Moreover, the
modularity optimization strategy may introduce a resolution limit [17] as well.
As for Lehmann's algorithm, since it extends Palla's work, the required user
input value $k$ and the lower and upper limits of the community size often
have a significant impact on the discovered communities, and are hard to
determine before the algorithm is run. In addition, vertices that are not
included in any $K_{a,b}$ biclique are ignored, so the set of all detected
communities usually cannot cover all the vertices of the original graph.

Therefore, to overcome these shortcomings, we propose BiTector, which uses a
local optimization strategy, does not suffer from the resolution problem, and
does not require any prior knowledge about the number of communities or other
related thresholds to assess the community structure. As of this writing,
BiTector is the first method that can handle bipartite networks consisting of
millions of nodes and edges.

3 BiClique-based Overlapping Community Detection Algorithm

Instead of dividing a network into its most loosely connected parts, BiTector
identifies the communities based on the most densely connected parts, namely,
the bicliques. We treat each group of highly overlapping maximal bicliques as
a clustering core. Around each core, we build up a community in a gradually
expanding way according to certain metrics until each vertex in the
network belongs to at least one community.

3.1 Notations and Definitions 

In this paper, we consider simple and connected graphs only, i.e., connected
graphs without self-loops or multi-edges. Given a graph $G_{(U,I)}$, where $U$
and $I$ (or $U_G$ and $I_G$) are the sets of the two different types of nodes,
$V(G_{(U,I)}) = U_G \cup I_G$ and $E(G_{(U,I)})$ denote the sets of all its
vertices and edges respectively.

Definition 1. Given a sub-bigraph $S_{(U,I)}$ with $U_S \subseteq U_G$ and
$I_S \subseteq I_G$, if $\forall u_i \in U_S, \forall v_j \in I_S$,
$\exists (u_i, v_j) \in E(S_{(U,I)})$, then $S_{(U,I)}$ is a biclique. If there
is no other biclique $S'_{(U,I)}$ such that $U_S \subset U_{S'}$ and
$I_S \subset I_{S'}$, $S_{(U,I)}$ is called a maximal biclique.
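
Definition 1 can be turned into a direct check. The sketch below assumes the usual reading of maximality (no single vertex can be added on either side), stores the graph as a dictionary from each U vertex to its set of I neighbors, and uses a toy graph in the spirit of Fig. 3 whose edges are an assumption.

```python
def is_biclique(adj, us, items):
    """Definition 1: every u in `us` is linked to every vertex in `items`."""
    return all(items <= adj.get(u, set()) for u in us)


def is_maximal_biclique(adj, us, items):
    """True if (us, items) is a biclique and no single vertex can be added to it."""
    if not is_biclique(adj, us, items):
        return False
    # Another U vertex linked to all of `items` would enlarge the biclique.
    if any(u not in us and items <= nbrs for u, nbrs in adj.items()):
        return False
    # Another I vertex linked to all of `us` would enlarge it as well.
    all_items = set().union(*adj.values())
    return not any(all(i in adj.get(u, set()) for u in us) for i in all_items - items)


# Assumed toy graph in the spirit of Fig. 3: U vertex -> set of I neighbours.
adj = {
    "U0": {"I0", "I1", "I2", "I3", "I4"},
    "U1": {"I0", "I1", "I2"},
    "U2": {"I0", "I1", "I2"},
    "U3": {"I3", "I4"},
    "U4": {"I3", "I4"},
}
print(is_biclique(adj, {"U0", "U1"}, {"I0", "I1"}))
print(is_maximal_biclique(adj, {"U0", "U1"}, {"I0", "I1"}))
print(is_maximal_biclique(adj, {"U0", "U1", "U2"}, {"I0", "I1", "I2"}))
```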



Definition 2. For a given vertex $v$, $N(v) = \{u \mid (v,u) \in E(G)\}$ is
called the neighbor set of $v$. For a sub-bigraph $S_{(U,I)}$, let
$N(U_S) = \bigcup N(u_i) - I_S$, $u_i \in U_S$, and
$N(I_S) = \bigcup N(v_j) - U_S$, $v_j \in I_S$; then $N(U_S) \cup N(I_S)$ is
called the neighbor set of $S_{(U,I)}$.
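
As a small illustration of Definition 2, the following sketch computes the neighbor set of a sub-bigraph from a bipartite edge list. The function name `neighbor_set` and the toy edge list are assumptions made for the example.

```python
def neighbor_set(edges, us, items):
    """Definition 2 for a sub-bigraph (us, items) of a bipartite edge list.

    Returns N(U_S) | N(I_S): vertices adjacent to the sub-bigraph but outside it.
    """
    n_us = {i for u, i in edges if u in us} - items   # N(U_S): I vertices outside I_S
    n_is = {u for u, i in edges if i in items} - us   # N(I_S): U vertices outside U_S
    return n_us | n_is


# Assumed toy edge list (U vertex, I vertex), not taken from the paper's data.
edges = [("U0", i) for i in ("I0", "I1", "I2", "I3", "I4")]
edges += [(u, i) for u in ("U1", "U2") for i in ("I0", "I1", "I2")]
edges += [(u, i) for u in ("U3", "U4") for i in ("I3", "I4")]

print(sorted(neighbor_set(edges, {"U0", "U1", "U2"}, {"I0", "I1", "I2"})))
print(sorted(neighbor_set(edges, {"U3", "U4"}, {"I3", "I4"})))
```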

Definition 3. $\mathcal{M}(G_{(U,I)})$ denotes the set of all maximal bicliques
$S_{(U,I)}$ ($|U_S| \ge 2$, $|I_S| \ge 2$) in $G_{(U,I)}$. Given a vertex
$v_i \in V(G_{(U,I)})$, $B(v_i) \subset \mathcal{M}(G_{(U,I)})$ is the set of
all maximal bicliques that contain $v_i$. The set of all $B(v_i)$ is denoted as
$\mathcal{B}$. For any pair of sub-bigraphs $G_i$ and $G_j$, a closeness
function isClose($G_i$, $G_j$) is defined and implemented in the next section to
identify whether they could be merged, by quantifying how "close" they
actually are. Given any two maximal bicliques $B_m, B_n \in B(v_i)$ with
$|U_{B_m}| > |U_{B_n}|$, let $G_m$ and $G_n$ be the sub-bigraphs induced on
$B_m$ and $B_n$ respectively. If isClose($G_m$, $G_n$) returns true, we say
$B_n$ is contained by $B_m$, denoted by $B_n < B_m$. If $B_m$ is not contained
by any other maximal biclique in $B(v_i)$, $B_m$ is called a core, and the set
of all cores is denoted by $\mathcal{C}$.

Definition 4. Let $S_0, S_1, \ldots, S_{n-1}$ be sub-bigraphs of $G_{(U,I)}$
such that $V(S_0) \cup \cdots \cup V(S_{n-1}) = V(G_{(U,I)})$. For any pair
$S_i$ and $S_j$, if $|E(S_i)| > |E_{between}(S_i, S_j)|$, $S_i$ is defined as a
community of $G_{(U,I)}$.
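
The criterion of Definition 4 compares inner edges with edges running between two sub-bigraphs. The sketch below checks it for a hypothetical example. It assumes vertex-disjoint sub-bigraphs, since the text does not say how shared vertices are to be counted.

```python
def inner_edges(edges, sub):
    """|E(S)|: edges with both endpoints inside the sub-bigraph sub = (U_S, I_S)."""
    us, items = sub
    return sum(1 for u, i in edges if u in us and i in items)


def between_edges(edges, sub_a, sub_b):
    """|E_between(S_a, S_b)|: edges joining a vertex of S_a to a vertex of S_b.

    Assumption: the two sub-bigraphs are vertex-disjoint.
    """
    (ua, ia), (ub, ib) = sub_a, sub_b
    return sum(1 for u, i in edges if (u in ua and i in ib) or (u in ub and i in ia))


def all_are_communities(edges, subs):
    """Check |E(S_i)| > |E_between(S_i, S_j)| for every ordered pair i != j."""
    return all(
        inner_edges(edges, subs[a]) > between_edges(edges, subs[a], subs[b])
        for a in range(len(subs)) for b in range(len(subs)) if a != b
    )


# Hypothetical example: two K_{2,2} blocks joined by a single edge.
edges = [(u, i) for u in ("U0", "U1") for i in ("I0", "I1")]
edges += [(u, i) for u in ("U2", "U3") for i in ("I2", "I3")]
edges.append(("U1", "I2"))
blocks = [({"U0", "U1"}, {"I0", "I1"}), ({"U2", "U3"}, {"I2", "I3"})]

print(inner_edges(edges, blocks[0]), between_edges(edges, blocks[0], blocks[1]))
print(all_are_communities(edges, blocks))
```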

3.2 Algorithm 

BiTector first enumerates all maximal bicliques in $G_{(U,I)}$. Because a
maximal biclique is a complete sub-bigraph, it is the densest community
structure and represents the closest relationship between the two types of
vertices in the given network. Given two sub-bigraphs $G_i$ and $G_j$, the
closeness function isClose($G_i$, $G_j$) uses the link pattern between $G_i$
and $G_j$ to quantify the influence that they exert on each other. We use
$\Delta_{ij}$ to denote the common vertices of $U_{G_i}$ and $U_{G_j}$, and
$\Gamma_{ij}$ to denote the common vertices of $I_{G_i}$ and $I_{G_j}$. The
left sub-bigraph of $G_i$ is then defined as $\mathcal{L}_i$ with
$U_{\mathcal{L}_i} = U_{G_i} - \Delta_{ij}$ and
$I_{\mathcal{L}_i} = I_{G_i} - \Gamma_{ij}$. Similarly, the left sub-bigraph of
$G_j$ is denoted as $\mathcal{L}_j$ with
$U_{\mathcal{L}_j} = U_{G_j} - \Delta_{ij}$ and
$I_{\mathcal{L}_j} = I_{G_j} - \Gamma_{ij}$. We denote the sub-bigraphs induced
on $U_{\mathcal{L}_i} \cup I_{\mathcal{L}_j}$ and
$U_{\mathcal{L}_j} \cup I_{\mathcal{L}_i}$ by
$G_{(U_{\mathcal{L}_i}, I_{\mathcal{L}_j})}$ and
$G_{(U_{\mathcal{L}_j}, I_{\mathcal{L}_i})}$ accordingly. Here the influence
that $G_i$ exerts on $G_j$ is defined based on $U_{G_i}$; it is equivalent if
we start from $I_{G_i}$.

$$\mathit{inf}_{ij} = |E(G_{(U_{\mathcal{L}_i}, I_{\mathcal{L}_j})})| + |E(G_{(U_{\mathcal{L}_j}, I_{\mathcal{L}_i})})| - |E(G_j)|$$

It is apparent that, for $G_i$ and $G_j$, $\mathit{inf}_{ij}$ reflects the
number of edges between them minus the number of $G_j$'s inner edges. If both
$\mathit{inf}_{ij} > 0$ and $\mathit{inf}_{ji} > 0$, then $G_i$ and $G_j$
should be merged into a single graph; otherwise, they are kept apart. The
implementation of isClose($G_i$, $G_j$) is formulated in Algorithm 1.
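
The influence value can be sketched in code under one explicit assumption: the subtracted term is taken to be the number of inner edges of $G_j$, as the prose above describes. The graph below is hypothetical and is not the graph of any figure; its edge counts were simply chosen as 8 inner edges, 6 inner edges, and 2 + 2 edges in between.

```python
def inner_edges(edges, sub):
    """Number of edges with both endpoints inside sub = (U part, I part)."""
    us, items = sub
    return sum(1 for u, i in edges if u in us and i in items)


def influence(edges, g_i, g_j):
    """inf_ij under the reading described in the lead-in (an assumption).

    Cross edges are counted between the "left" parts, i.e. after removing the
    shared vertices Delta (U side) and Gamma (I side) from both sub-bigraphs.
    """
    (u_i, i_i), (u_j, i_j) = g_i, g_j
    delta, gamma = u_i & u_j, i_i & i_j
    ul_i, il_i = u_i - delta, i_i - gamma
    ul_j, il_j = u_j - delta, i_j - gamma
    cross_ij = sum(1 for u, i in edges if u in ul_i and i in il_j)
    cross_ji = sum(1 for u, i in edges if u in ul_j and i in il_i)
    return cross_ij + cross_ji - inner_edges(edges, g_j)


# Hypothetical graph: g0 has 8 inner edges, g1 has 6,
# and 2 + 2 edges run between them.
g0 = ({"U0", "U1", "U2"}, {"I0", "I1", "I2"})
g1 = ({"U3", "U4"}, {"I3", "I4", "I5"})
edges = [(u, i) for u in sorted(g0[0]) for i in sorted(g0[1]) if (u, i) != ("U2", "I2")]
edges += [(u, i) for u in sorted(g1[0]) for i in sorted(g1[1])]
edges += [("U1", "I3"), ("U2", "I3"), ("U3", "I2"), ("U4", "I2")]

inf_01 = influence(edges, g0, g1)
inf_10 = influence(edges, g1, g0)
print(inf_01, inf_10, inf_01 > 0 and inf_10 > 0)
```

With these counts both values are negative, so under the stated rule the two sub-bigraphs would be kept apart.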

To make things more concrete, an illustrative example is given on the network
shown in Fig. 2. There are two sub-bigraphs: $G_0$, circled by a red dashed
line, and $G_1$, circled by a green dashed line. $U_{G_0}$ = {U0, U1, U2},
$I_{G_0}$ = {I0, I1, I2}, $U_{G_1}$ = {U3, U4}, $I_{G_1}$ = {4, h}.
$\mathit{inf}_{01} = 2 + 2 - 6 = -2 < 0$, and
$\mathit{inf}_{10} = 2 + 2 - 8 = -4 < 0$, so $G_0$ and $G_1$ should not be
merged. Starting from $\mathcal{M}(G_{(U,I)})$,




Fig. 2. Example of Core Formation 



we first find $B(u_i) \in \mathcal{B}$ for every vertex $u_i \in G_{(U,I)}$.
Because every maximal biclique in $B(u_i)$ corresponds to one group of vertices
in $U_G$ which, together with $u_i$, are closely interconnected through their
joint connections with a certain cluster of vertices in $I_G$, $B(u_i)$ covers
all the densest communities in which $u_i$ has participated.
$\forall u_i, u_j \in U_G$, let $G_i$ and $G_j$ be the sub-bigraphs induced on
$B(u_i)$ and $B(u_j)$. If isClose($G_i$, $G_j$) returns true, which means that
all or most of $u_j$'s relationships are covered by those of $u_i$, then $u_j$
should stay in the same community as $u_i$. We rearrange the elements of
$\mathcal{B}$ in descending order of size. Let $B(u_k)$ be the element of
$\mathcal{B}$ whose size is the largest. We put $B(u_k)$ into set
$\mathcal{H}$ and remove it from $\mathcal{B}$. All the other elements
contained by $B(u_k)$ are also removed from $\mathcal{B}$. Again, we pick the
next largest element of $\mathcal{B}$, put it into $\mathcal{H}$, and remove
it, as well as the elements it contains, from $\mathcal{B}$. The process
continues until $\mathcal{B}$ is empty, so set $\mathcal{H}$ stores elements
that are independent of each other.
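
The greedy selection just described can be sketched as follows. This is a minimal illustration, not the authors' code: `select_independent` is a hypothetical name, each $B(u)$ is modelled as a plain set of biclique ids, and the closeness test is passed in as a caller-supplied stand-in for isClose.

```python
def select_independent(groups, is_close):
    """Greedy selection described in the text.

    groups: dict mapping a vertex to its B(u), here any sized collection.
    is_close(big, small): caller-supplied stand-in for isClose.
    Returns the vertices whose groups are kept, largest first.
    """
    remaining = sorted(groups, key=lambda v: len(groups[v]), reverse=True)
    kept = []
    while remaining:
        head = remaining.pop(0)          # largest element still in B
        kept.append(head)                # ... goes into H
        # everything "contained by" the head is removed from B as well
        remaining = [v for v in remaining if not is_close(groups[head], groups[v])]
    return kept


# Hypothetical input: plain set inclusion stands in for the real closeness test.
groups = {"u0": {1, 2, 3}, "u1": {1, 2}, "u2": {4, 5}, "u3": {5}}
print(select_independent(groups, lambda big, small: small <= big))
```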



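Algorithm 1 isClose($G_i$, $G_j$)
  {$G_L \leftarrow$ $G_i$ or $G_j$ with the larger set of $U$, $G_S \leftarrow$ $G_i$ or $G_j$ with the smaller set of $U$}
  {$\Delta = U_{G_L} \cap U_{G_S}$, $\Gamma = I_{G_L} \cap I_{G_S}$}
  if ($U(G_S) \subset U(G_L)$) or ($I(G_S) \subset I(G_L)$) or ($I(G_L) \subset I(G_S)$) then
    return true
  else
    $C_L = |E(G_{(\Delta,\Gamma)})| - |E(\mathcal{L}_L)|$
    $C_S = |E(G_{(\Delta,\Gamma)})| - |E(\mathcal{L}_S)|$
    if ($C_L > 0$) or ($C_S > 0$) or ($\mathit{inf}_{LS} > 0$ and $\mathit{inf}_{SL} > 0$) then
      return true
    end if
  end if
  return false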



Fig. 3. Example of Core Formation 



In general, the distribution of vertex degree in bipartite networks conforms
to a power law. It is common that a few vertices in $U_G$ have connections
with nearly all vertices in $I_G$. As a result, these vertices can appear in
many maximal bicliques repeatedly. For example, in Fig. 3, we can see that
$B(U0)$ = {{U0, U1, U2, I0, I1, I2}, {U0, U3, U4, I3, I4}}, and U0 has
connections with all the vertices from I0 to I4. However, it is obvious that
{U0, U1, U2, I0, I1, I2} and {U0, U3, U4, I3, I4} are two different communities
with an overlapping vertex U0. Consequently, to address this problem, given
$u_i$ and $B(u_i)$, we need to further refine $B(u_i)$ into several
sub-bigraphs representing the different communities in which $u_i$ takes part
simultaneously. Therefore, for each element $H_k \in \mathcal{H}$, the maximal
bicliques $B_m \in H_k$ are sorted in descending order of $|U_{B_m}|$. Given
$B_m, B_n$ with $|U_{B_m}| > |U_{B_n}|$, if isClose($G_m$, $G_n$) returns true,
$B_n$ is contained by $B_m$. If $B_m$ is not contained by any other element in
$H_k$, $B_m$ is regarded as a core and is put into set $\mathcal{C}$. This
process continues until every element in $\mathcal{H}$ has been refined. The
whole procedure is described in Algorithm 2.

Clustering. Once all the cores have been detected, we carry out a clustering
process to associate the remaining vertices with their "closest" cores. For
each sub-bigraph $G_i$ induced on $C_i \in \mathcal{C}$, we gradually expand
$G_i$ by adding the vertices in the set $N(U_{G_i}) \cup N(I_{G_i})$. Given
$\forall v_i \in V(G_{(U,I)})$ and $\forall C_j \in \mathcal{C}$, the distance
between $v_i$ and $G_j$ is defined as follows:

$$\mathrm{distance}(v_i, G_j) = \begin{cases} \dfrac{|N(v_i) \cap I_{G_j}|}{|N(v_i) \cup I_{G_j}|}, & v_i \in U_G \\[2ex] \dfrac{|N(v_i) \cap U_{G_j}|}{|N(v_i) \cup U_{G_j}|}, & v_i \in I_G \end{cases}$$

As a consequence, $v_i$ is assigned to the cores with the maximum distance
value (the measure is an overlap ratio, so a larger value indicates a closer
core). Since a vertex may have the same maximum distance value with more
than one core, $v_i$ can be assigned to multiple cores simultaneously.

Since a vertex $v_i$ usually does not have connections with all the cores in
$\mathcal{C}$, we adopt a coloring strategy to reduce the computation
cost. First, the vertices covered by all cores in $\mathcal{C}$ are colored as
old. We use set


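Algorithm 2 CoreFormation($\mathcal{M}(G_{(U,I)})$)
  $\mathcal{C} \leftarrow \emptyset$, $\mathcal{H} \leftarrow \emptyset$, contained $\leftarrow \emptyset$
  Get $\mathcal{B}$ from $\mathcal{M}(G_{(U,I)})$ and sort $B(u_i) \in \mathcal{B}$ in descending order of size
  for $\forall B(u_i) \in \mathcal{B}$ do
    if $B(u_i) \notin$ contained then
      contained $\Leftarrow B(u_j)$, if isClose($G_i$, $G_j$) returns true
      add $B(u_i)$ to $\mathcal{H}$
    end if
  end for
  for $\forall H_k \in \mathcal{H}$ do
    contained $\leftarrow \emptyset$
    for $\forall B_m \in H_k$ do
      if $B_m \notin$ contained then
        contained $\Leftarrow B_n$, $B_n < B_m$
        add $B_m$ to $\mathcal{C}$
      end if
    end for
  end for
  return $\mathcal{C}$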



$U_C \subseteq U_G$ and $I_C \subseteq I_G$ to store the two types of vertices
covered by $\mathcal{C}$. Next, every new vertex in $N(U_C)$ and $N(I_C)$ is
assigned to its closest cores and colored as old. As a result, every core is
now expanded. Again, starting from $N(U_C)$ and $N(I_C)$, all new vertices in
$N(N(U_C))$ and $N(N(I_C))$ that have not been colored are assigned and
colored. The clustering continues until all the vertices of the network are
colored as old. In the end, let $\mathcal{C}'$ denote the set of all expanded
cores. We use the same process as in Core Formation to compare the closeness
between $C'_i$ and $C'_j \in \mathcal{C}'$. If isClose($G'_i$, $G'_j$) returns
true, $C'_i$ and $C'_j$ are merged together. The whole process is presented in
Algorithm 3.
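
The assignment rule used during clustering can be sketched as follows. The sketch assumes the overlap-ratio form of the distance (shared neighbors divided by the union) and uses exact fractions so that ties are detected reliably. The function names, the example cores, and the choice to leave a vertex with no link to any core unassigned are all assumptions made for illustration.

```python
from fractions import Fraction


def distance(neighbors, core, side):
    """Overlap ratio between N(v) and the opposite-type vertices of a core.

    core is a pair (U_core, I_core); side is "U" if v is a U vertex, else "I".
    """
    other = core[1] if side == "U" else core[0]
    union = neighbors | other
    return Fraction(len(neighbors & other), len(union)) if union else Fraction(0)


def closest_cores(neighbors, cores, side):
    """Indices of all cores that attain the maximum value (ties are kept)."""
    scores = [distance(neighbors, core, side) for core in cores]
    best = max(scores)
    # Assumption: a vertex with no link to any core is left unassigned.
    return [k for k, s in enumerate(scores) if s == best and best > 0]


# Hypothetical cores and neighbourhoods, for illustration only.
cores = [({"U0", "U1"}, {"I0", "I1"}), ({"U2", "U3"}, {"I2", "I3"})]
print(closest_cores({"I0", "I1"}, cores, side="U"))   # U vertex tied to core 0 only
print(closest_cores({"I1", "I2"}, cores, side="U"))   # one link into each core
```

In the second call the vertex has the same overlap ratio with both cores, so it is assigned to both, which is how overlapping communities arise.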

3.3 Complexity 

Like the classic maximal clique problem in unipartite networks, the
enumeration of all maximal bicliques is an NP problem as well. However, most
real-world bipartite networks are large sparse graphs ($N = |V(G)|$,
$M = |E(G)|$, $N \approx M$), and there exist modern algorithms that are very
efficient on sparse graphs. Because the enumeration of maximal bicliques is
equivalent to the closed itemset problem, we use LCM (Linear time Closed
itemset Miner) [18] to mine all the maximal bicliques. On sparse graphs, the
computational complexity of LCM is almost proportional to $O(M)$. The
calculation of set $\mathcal{H}$ costs $O(N)$. Let $S_c$ be the maximum size of
$B(u_i) \in \mathcal{B}$. It costs $O(|\mathcal{H}| \times S_c)$ to calculate
the core set $\mathcal{C}$. In the end, the clustering process costs
$O(N \times |\mathcal{C}|)$. Because on sparse bipartite networks
$|\mathcal{M}(G_{(U,I)})| \approx N \approx M$, $|\mathcal{H}| \le N$, and
$S_c < |\mathcal{C}| < |\mathcal{M}(G_{(U,I)})|$, the total complexity of
BiTector is therefore $O(M^2)$.
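
The bound can be traced term by term. The following LaTeX sketch only rearranges the costs quoted above, under the stated sparse-graph assumptions that $N \approx M$ and that the number of maximal bicliques is of the same order; it is a reading aid rather than a formal proof.

```latex
\begin{align*}
T_{\text{total}}
  &= \underbrace{O(M)}_{\text{LCM}}
   + \underbrace{O(N)}_{\text{set } \mathcal{H}}
   + \underbrace{O(|\mathcal{H}| \times S_c)}_{\text{core set } \mathcal{C}}
   + \underbrace{O(N \times |\mathcal{C}|)}_{\text{clustering}} \\
  &\le O(M) + O(N) + O(N \times |\mathcal{C}|)
   && \text{since } |\mathcal{H}| \le N,\ S_c < |\mathcal{C}| \\
  &\le O(M) + O(N) + O(N \times |\mathcal{M}(G_{(U,I)})|)
   && \text{since } |\mathcal{C}| < |\mathcal{M}(G_{(U,I)})| \\
  &= O(M^2)
   && \text{since } |\mathcal{M}(G_{(U,I)})| \approx N \approx M .
\end{align*}
```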



Algorithm 3 Clustering($\mathcal{C}$)
  for $\forall C_i \in \mathcal{C}$ do
    $\forall v_k \in V(G_{C_i})$ is marked as old
  end for
  $U_{exp} \leftarrow N(I_C)$, $I_{exp} \leftarrow N(U_C)$
  while not all vertices in $V(G_{(U,I)})$ are colored do
    for $\forall v_i \in U_{exp} \cup I_{exp}$ do
      if $v_i$ is not colored then
        assign $v_i$ to its closest core, and color $v_i$ as old
      end if
    end for
    $U'_{exp} \leftarrow \emptyset$, $I'_{exp} \leftarrow \emptyset$
    add $v_j$ to $U'_{exp}$, if $\forall v_j \in N(I_{exp})$ and $v_j$ is not colored
    add $v_k$ to $I'_{exp}$, if $\forall v_k \in N(U_{exp})$ and $v_k$ is not colored
    $U_{exp} \Leftarrow U'_{exp}$, $I_{exp} \Leftarrow I'_{exp}$
  end while
  sort Core according to descending order of $|U_{G_{C_i}}|$, $C_i \in$ Core
  for $\forall C_i \in \mathcal{C}$ do
    if $C_i$ is not merged then
      $C_j$ is merged into $C_i$, if isClose($G_i$, $G_j$) returns true
    end if
  end for
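
The expansion loop of Algorithm 3 (everything before the final merge step) can be sketched as a layer-by-layer traversal. The code below is a simplified illustration under stated assumptions: U and I vertices share one adjacency dictionary, a whole layer is assigned against the same snapshot of the communities, and the assignment rule `most_shared_neighbours` is a hypothetical stand-in for the distance-based rule.

```python
def expand_cores(adj, cores, pick_cores):
    """Layer-by-layer expansion in the spirit of the while loop of Algorithm 3.

    adj: dict vertex -> set of neighbours (U and I vertices together).
    cores: list of vertex sets (left unmodified).
    pick_cores(v, communities): caller-supplied rule returning the indices of
        the communities v should join; a stand-in for the distance rule.
    """
    communities = [set(c) for c in cores]
    colored = set().union(*communities)              # vertices covered by the cores
    frontier = {w for v in colored for w in adj[v]} - colored
    while frontier:
        # Assumption: a whole layer is assigned against the same snapshot.
        snapshot = [set(c) for c in communities]
        for v in sorted(frontier):
            for k in pick_cores(v, snapshot):
                communities[k].add(v)
        colored |= frontier                          # the layer is now "old"
        frontier = {w for v in frontier for w in adj[v]} - colored
    return communities


def most_shared_neighbours(adj):
    """Simple stand-in rule: join every community holding the most neighbours of v."""
    def pick(v, communities):
        scores = [len(adj[v] & c) for c in communities]
        best = max(scores)
        return [k for k, s in enumerate(scores) if s == best and best > 0]
    return pick


# Hypothetical graph: U1 bridges both cores, I3 hangs off U1.
adj = {
    "U0": {"I0"}, "I0": {"U0", "U1"},
    "U2": {"I2"}, "I2": {"U2", "U1"},
    "U1": {"I0", "I2", "I3"}, "I3": {"U1"},
}
result = expand_cores(adj, [{"U0", "I0"}, {"U2", "I2"}], most_shared_neighbours(adj))
print([sorted(c) for c in result])
```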



4 Experimental Results 

In this section, we present the experimental results and analysis on several
real, large bipartite networks from different domains. All experiments are done
on a single PC (3.0 GHz processor with 2 GB of main memory, running Linux AS).
The execution time of BiTector includes both the biclique finding time
and the community detection time. The experimental results are shown in
Table 1.



Table 1. Experimental Results

Graph                                         Vertices |V|   Edges |E|   Time (s)
DAVIS SOUTHERN CLUB WOMEN [19]                          32          93        0.5
NATION-SPORT NETWORK OF OLYMPIC GAMES [20]             515         208          1
CUSTOMER-PRODUCT NETWORK [21]                        2,008       3,258          2
PROTEIN INTERACTION NETWORK OF YEAST [22]            3,740       4,480          2
AUTHOR-PAPER NETWORK OF arXiv [23]                  20,454      24,154          6
MOVIE-RATING NETWORK OF NETFLIX [24]                75,179     100,000         92
BOOK-RATING NETWORK [25]                           263,804     433,695      4,028
IMDB NETWORK [22]                                  289,435     637,035      4,312



DAVIS SOUTHERN CLUB WOMEN. The Southern women data set describes the
participation of 18 women in 14 social events. The women and social events
constitute a bipartite network; an edge exists between a woman and a social
event if the woman attended the event. This data set was collected by Davis as
part of an extensive study of class and race in the Deep South and has been
much studied. BiTector finds 4 overlapping communities, shown in Fig. 4. Each
community is circled by a colored dashed line, and the overlapping vertices
are colored black. It is apparent that E8 and E9 are two very famous clubs
that attract 9 women to join. Similarly, Barber's algorithm also gets 4
separated communities, while Guimerà's method finds two coarse ones.

CUSTOMER-PRODUCT NETWORK is derived from the purchase data of Gazelle.com,
a legwear and legcare web retailer that closed its online store on 8/18/2000.
An edge connects a customer to the products he or she has ordered.

PROTEIN INTERACTION NETWORK OF YEAST contains two types of proteins. One 
represents the bait proteins and the other represents the prey proteins. An edge 
links a prey protein to a bait protein if the prey protein binds to the bait one. 

AUTHOR-PAPER NETWORK OF arXiv presents the relationships between authors
and papers. An edge links an author to a paper if the author has published
the paper. Each discovered community in this network intuitively links
certain experts to their research areas, as reflected by the published
papers on which they have collaborated. Fig. 5 describes one community




Fig. 4. SOUTHERN CLUB WOMEN 



where Prof. M.E.J. Newman has been involved. Newman proposed the classic
GN algorithm [12] for community detection in unipartite networks, and the
community detected by BiTector in Fig. 5 directly finds one of the circles
in which he has often been involved in the physics community.

Fig. 5. Experts with Research Areas



MOVIE-RATING NETWORK OF NETFLIX is composed of users and the movies they have
rated. Netflix provides an evaluation mechanism that enables users to rate
movies with a score of up to 10 to express their preferences. There exists an
edge between a user and a movie if the user has rated the movie. In our
experiment, we build the network from Netflix's rating data in 2006.



BOOK-RATING NETWORK is built from the Book-Crossing community. In our
experiment, there exists an edge between a user and a book if the user has
given a non-zero rating score to the book.



IMDB NETWORK is composed of actors and movies. A link connects an actor or
actress to a movie in which he or she has starred.



In the experiments, except on DAVIS SOUTHERN CLUB WOMEN, neither Barber's
nor Guimerà's algorithm could run on the data sets within an acceptable
time. For Lehmann's algorithm, since the discovered communities depend on the
user input value $k$ and on the required lower and upper bounds of the
community size, we do not include the corresponding results here.
We further evaluate the homogeneity of the communities discovered by BiTector
by comparing them with their counterparts in random bipartite networks. For
any discovered community $C_{(U,I)}$, we first randomly choose $|U_C|$ vertices
from $U_G$ into set $U_R$. Then, from the union of the neighbor sets of the
chosen vertices, we further randomly choose $|I_C|$ vertices into set $I_R$.
As a consequence, we obtain a

Fig. 6. Communities' Homogeneity

randomly generated community $R_{(U,I)}$ having the same size as $C_{(U,I)}$.
In Fig. 6, each symbol corresponds to the average number of inner edges for a
given community size, $n_{\langle real \rangle}$, divided by the same quantity
found in random sets, $n_{\langle rand \rangle}$. We can see that the
$n_{\langle real \rangle} / n_{\langle rand \rangle}$ ratio is significantly
larger than 1, indicating that the communities discovered by BiTector tend to
contain closely interrelated entities, a homogeneity that supports the validity
and effectiveness of the discovered communities.
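
The random baseline described above can be sketched for a single community. This is an illustration under assumptions, not the evaluation code behind Fig. 6: the function names are hypothetical, the toy graph is invented, the ratio is computed for one community rather than averaged per community size, and when the neighbor pool is smaller than $|I_C|$ the whole pool is taken.

```python
import random


def inner_edge_count(edges, us, items):
    """Number of edges with both endpoints inside (us, items)."""
    return sum(1 for u, i in edges if u in us and i in items)


def random_counterpart(edges, all_us, size_u, size_i, rng):
    """Draw U_R at random, then I_R from the union of U_R's neighbours."""
    us_r = set(rng.sample(sorted(all_us), size_u))
    pool = sorted({i for u, i in edges if u in us_r})
    # Assumption: if the pool is smaller than size_i, take the whole pool.
    items_r = set(rng.sample(pool, min(size_i, len(pool))))
    return us_r, items_r


def homogeneity_ratio(edges, community, all_us, trials=200, seed=0):
    """n_real / n_rand for one community, averaged over random counterparts."""
    rng = random.Random(seed)
    us, items = community
    n_real = inner_edge_count(edges, us, items)
    total = 0
    for _ in range(trials):
        us_r, items_r = random_counterpart(edges, all_us, len(us), len(items), rng)
        total += inner_edge_count(edges, us_r, items_r)
    n_rand = total / trials
    return n_real / n_rand if n_rand else float("inf")


# Hypothetical graph: a K_{2,2} block plus a few sparse edges.
edges = [(u, i) for u in ("U0", "U1") for i in ("I0", "I1")]
edges += [("U2", "I2"), ("U3", "I3"), ("U2", "I0")]
all_us = {u for u, _ in edges}
ratio = homogeneity_ratio(edges, ({"U0", "U1"}, {"I0", "I1"}), all_us)
print(round(ratio, 2))
```

Because the chosen community is a complete biclique, no random pair of the same size can contain more inner edges, so the ratio is at least 1 on this toy graph.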

NATION-SPORT NETWORK OF OLYMPIC GAMES. Besides the bipartite networks
discussed above, BiTector is further tested on the networks of the Summer
Olympic Games from 1896 to 2004. For each Games, we build a network according
to the relationships between nations and the corresponding sports. An edge
links a nation and a specific sport if the nation has won medals in that sport.
In total, 25 networks are built from 1896 to 2004. The average numbers
of nodes and edges are 515 and 208 respectively.

Each discovered community directly represents a certain group of sports in
which a few nations often compete with each other. For example, Fig. 7 depicts
the sports in which China won medals, as well as the corresponding competing
nations, in the year 2004. Each community in which China is involved is marked
in a different color. It is intuitive that in sports such as TableTennis and
Badminton, KOR is a strong competitor, while in Swimming and Diving, China has
to compete with USA and AUS.





Fig. 8. USA in Olympic Games 2004 



By contrast, Fig. 8 presents the sports in which USA won medals in the
year 2004. It is apparent that most of USA's strongest sports are concentrated
in Swimming, Athletics, and Gymnastics, with major competitors such as
AUS in Swimming, ROU in Gymnastics, and ITA in Athletics.

Fig. 9. CHN vs. USA in Olympic Games From 1984 to 2004



Given the set of communities at time $t$, $C_t$, for any community
$C_t^i \in C_t$, if there exists at least one community
$C_{t+1}^j \in C_{t+1}$ such that

we say $C_{t+1}^j \in C_{t+1}$ is the descendant of $C_t^i \in C_t$, and
$C_t^i \in C_t$ evolves to $C_{t+1}^j \in C_{t+1}$. In our experiments, the
empirical value of the threshold parameter on the Olympic data is set to 0.1.
Fig. 9 depicts the evolving trace of one community in which CHN competes with
USA in Diving and ArtisticGymnastics from 1984 to 2004. We see that although
CHN competed with USA in Diving: 10m Platform Women continuously for 4 Olympic
Games, USA still kept its advantage in water sports steadily.

5 Conclusion 

In this paper, we have proposed a new method, BiTector, for efficient
overlapping community identification in large-scale bipartite networks. We
have demonstrated the effectiveness and efficiency of BiTector on a number of
real networks from disparate domains whose structures are otherwise difficult
to understand. Experimental results show that the algorithm can extract
meaningful communities that agree with both objective facts and our intuitions.
BiTector avoids the loss of essential information caused by the one-mode
projection approach and by thresholding procedures, and is expected to be of
great help in many practical scenarios.

Acknowledgments. We thank Xin Yang for the collection of the Olympic 